Restoration of images scanned from thick bound documents
Abstract
Perspective distortion always occurs when thick, bound documents are scanned. This distortion causes two main kinds of degradation in the scanned grayscale image: (i) shade along the ‘spine’ of the book, and (ii) warping of the words in the shade. In this paper, we propose a restoration system to solve these two problems. Our system first produces a vertical projection profile to detect which side of the image the shade lies on, and then uses a run-length method to find the boundary between the shade and the clean area. We then apply a modified Niblack’s method to remove the shade. Finally, we use connected component analysis to adjust the location and orientation of the warped words in the shade area, i.e. the words within the boundary detected earlier. Implementation results are presented in this paper. Our system will be used in a text retrieval project for the National Archives of Singapore.
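As a rough illustration of two of the steps named in the abstract, the sketch below computes a vertical projection profile to guess which side of the page is shaded, and applies the standard Niblack local threshold (T = m + k·s over a small window). It is a minimal sketch under stated assumptions, not the authors' system: the abstract does not say how Niblack's method was modified, so the unmodified form is used, and the function names, the window size, the value k = -0.2, and the half-versus-half brightness comparison are all hypothetical choices made for this example. The run-length boundary search and the connected component step are not covered.

```python
from statistics import mean, pstdev


def vertical_projection_profile(image):
    """Mean gray level of every column; image is a list of rows, 0 = black, 255 = white."""
    height = len(image)
    width = len(image[0])
    return [sum(image[y][x] for y in range(height)) / height for x in range(width)]


def shaded_side(image):
    """Guess the shaded side by comparing mean brightness of the two profile halves.

    Assumption for this sketch: the spine shade makes one half darker on average.
    """
    profile = vertical_projection_profile(image)
    half = len(profile) // 2
    return "left" if mean(profile[:half]) < mean(profile[half:]) else "right"


def niblack_binarize(image, window=3, k=-0.2):
    """Standard (unmodified) Niblack thresholding.

    Each pixel is compared with T = m + k * s, where m and s are the mean and
    standard deviation of the gray levels in the window around it. The window
    is clipped at the image border.
    """
    height, width = len(image), len(image[0])
    r = window // 2
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            patch = [
                image[j][i]
                for j in range(max(0, y - r), min(height, y + r + 1))
                for i in range(max(0, x - r), min(width, x + r + 1))
            ]
            threshold = mean(patch) + k * pstdev(patch)
            # Darker than the local threshold: treat as ink (0); otherwise background (255).
            row.append(0 if image[y][x] < threshold else 255)
        result.append(row)
    return result
```

Because the threshold is computed per window, a dark stroke is separated from its immediate surroundings whether the surrounding paper is bright or shaded, which is why a local method of this family suits shade removal better than a single global threshold.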
Similar resources
Recovery of Distorted Document Images from Bound Volumes
Recovery of document images scanned from thick bound volumes is necessary for the purpose of human reading and text retrieval. The main problem with scanning bound volumes is that perspective distortion always occurs. Such distortion causes two sources of degradation in the scanned images: 1) shadow at the book spine area, and 2) warping of the words in the shadow. In this paper, we ...
Straightening warped text lines using polynomial regression
Perspective distortion always occurs while scanning thick, bound documents, resulting in two problems in the scanned grayscale image: (i) shade along the ‘spine’ of the book, and (ii) warping of words in the shade area. We proposed a restoration system to solve these two problems in our previous paper [1]. However, the shape of the warped words was not fully restored, since we simply shifted an...
Curvature Correction and Shadow Images of Scanned Documents based on Boundary Lines of Text and Brightness Estimation Function
When the pages of a book or thick document are scanned, two types of degradation, geometric and optical, arise in the scanned images. As a result, the lines of text in the document appear curved and the area near the book binding is shaded, which makes the page difficult to read. Attempts have been made to correct such images. In this paper, we review the methods for correcting the ...
Bleed-through removal in degraded documents
This paper presents a linear-based restoration method for bleed-through degraded document images and uses a Bayesian approach for bleed-through reduction. A variation of iterated conditional modes (ICM) optimisation is used whereby samples are drawn for the clean image estimates, whilst the remaining variables are estimated via the mode of their conditional probabilities. The proposed method is...
A FILTERED B-SPLINE MODEL OF SCANNED DIGITAL IMAGES
We present an approach for modeling and filtering digitally scanned images. The digital contour of an image is segmented to identify the linear segments, the nonlinear segments and critical corners. The nonlinear segments are modeled by B-splines. To remove the contour noise, we propose a weighted least squares model to account for both the fitness of the splines as well as their approximate cur...